Outline

0 Introduction
In this paper… An overview of the purpose of your analysis, including any necessary details or references.

1 Overview
1.1 Business Problem
1.2 Business Value Proposition

2 Data Description
◦ the source
◦ the variables, with definitions or a link to a codebook
◦ the number of observations in your data set
◦ detail on missingness
◦ a glimpse of your data if possible (e.g. head and tail)

3 Data Preprocessing
◦ Feature generation
◦ Imputation
◦ Cleaning or merging of categories
◦ Outlier removal
◦ Anything that changes your data from the original form

4 Your final analysis in small pieces, with annotation
5 Graphs to visualize different steps in your analysis
6 Clear discussion of why you made analysis choices
7 References to papers or citations you used to make decisions about the analysis

1 Overview

1.1 Business Problem

1.2 Business Value Proposition

2 Data Description

2.1 Source of Data

2.3 A Glimpse of the Data

# Load the combined wine data set and show its first rows
wine <- readRDS("wine.RDS")
head(wine)

2.4 Dataset Summary

MetaData

library(DataExplorer)
library(kableExtra)
# Basic metadata (rows, columns, missing values, memory usage) as a one-column table
z <- introduce(wine)
z <- as.data.frame(t(z))
colnames(z) <- NULL
knitr::kable(
  z,
  caption = "Data Introduction"
) %>% kable_styling(bootstrap_options = c("striped", "hover"),
                    full_width = FALSE,
                    font_size = 12,
                    position = "left")
Data Introduction
rows 6497
columns 13
discrete_columns 0
continuous_columns 13
all_missing_columns 0
total_missing_values 0
complete_rows 6497
total_observations 84461
memory_usage 653376

Histogram

library(DataExplorer)
plot_histogram(wine)

Bar Plot

library(DataExplorer)
plot_bar(wine)

Statistical Summary

summary(wine)
##  fixed.acidity    volatile.acidity  citric.acid     residual.sugar  
##  Min.   : 3.800   Min.   :0.0800   Min.   :0.0000   Min.   : 0.600  
##  1st Qu.: 6.400   1st Qu.:0.2300   1st Qu.:0.2500   1st Qu.: 1.800  
##  Median : 7.000   Median :0.2900   Median :0.3100   Median : 3.000  
##  Mean   : 7.215   Mean   :0.3397   Mean   :0.3186   Mean   : 5.443  
##  3rd Qu.: 7.700   3rd Qu.:0.4000   3rd Qu.:0.3900   3rd Qu.: 8.100  
##  Max.   :15.900   Max.   :1.5800   Max.   :1.6600   Max.   :65.800  
##    chlorides       free.sulfur.dioxide total.sulfur.dioxide    density      
##  Min.   :0.00900   Min.   :  1.00      Min.   :  6.0        Min.   :0.9871  
##  1st Qu.:0.03800   1st Qu.: 17.00      1st Qu.: 77.0        1st Qu.:0.9923  
##  Median :0.04700   Median : 29.00      Median :118.0        Median :0.9949  
##  Mean   :0.05603   Mean   : 30.53      Mean   :115.7        Mean   :0.9947  
##  3rd Qu.:0.06500   3rd Qu.: 41.00      3rd Qu.:156.0        3rd Qu.:0.9970  
##  Max.   :0.61100   Max.   :289.00      Max.   :440.0        Max.   :1.0390  
##        pH          sulphates         alcohol         quality     
##  Min.   :2.720   Min.   :0.2200   Min.   : 8.00   Min.   :3.000  
##  1st Qu.:3.110   1st Qu.:0.4300   1st Qu.: 9.50   1st Qu.:5.000  
##  Median :3.210   Median :0.5100   Median :10.30   Median :6.000  
##  Mean   :3.219   Mean   :0.5313   Mean   :10.49   Mean   :5.818  
##  3rd Qu.:3.320   3rd Qu.:0.6000   3rd Qu.:11.30   3rd Qu.:6.000  
##  Max.   :4.010   Max.   :2.0000   Max.   :14.90   Max.   :9.000  
##    wine.type     
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.2461  
##  3rd Qu.:0.0000  
##  Max.   :1.0000

Correlation Matrix

library(DataExplorer)
plot_correlation(wine, type = "continuous")

2.5 Missing Data

library(DataExplorer)
plot_missing(wine)

The dataset contains no missing values.
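The plot can be cross-checked numerically. The following is a minimal sketch in base R, using a small made-up data frame (`toy` is hypothetical and is not the wine data) so that it runs on its own; the same two calls could be applied to `wine`.

```r
# Hypothetical toy data frame standing in for `wine` (values are made up)
toy <- data.frame(
  alcohol = c(9.4, NA, 10.1, 11.2),
  pH      = c(3.51, 3.20, NA, 3.16),
  quality = c(5, 6, 5, 7)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))

# Number of rows that contain no missing value at all
n_complete_rows <- sum(complete.cases(toy))

print(missing_per_column)
print(n_complete_rows)
```

For the wine data, the summary table in Section 2.4 reports total_missing_values 0 and complete_rows 6497, so both checks would be expected to agree with the plot.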

3 Data Preprocessing

3.1 Merging Datasets

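# Read the red and white wine files (semicolon-separated)
wineRed <- read.csv("winequality-red.csv", sep = ";")
wineWhite <- read.csv("winequality-white.csv", sep = ";")
# Flag the wine type: 1 = red, 0 = white
wineRed$wine.type <- 1
wineWhite$wine.type <- 0
dim(wineRed)
## [1] 1599   13
dim(wineWhite)
## [1] 4898   13
# Stack the two data sets into one
wine <- rbind(wineRed, wineWhite)
dim(wine)
## [1] 6497   13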
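A quick sanity check on the merge can be sketched with two tiny stand-in data frames (`red_toy` and `white_toy` are hypothetical; the real files are not read here). The row counts should add up, and the `wine.type` flag should split the rows into the two original groups. The last lines repeat the arithmetic for the dimensions reported in this section.

```r
# Hypothetical stand-ins for wineRed and wineWhite (values are made up)
red_toy   <- data.frame(alcohol = c(9.4, 9.8),       quality = c(5, 5))
white_toy <- data.frame(alcohol = c(8.8, 9.5, 10.1), quality = c(6, 6, 6))

# Flag the wine type before stacking: 1 = red, 0 = white (as in the text)
red_toy$wine.type   <- 1
white_toy$wine.type <- 0

# rbind() on data frames needs matching column names
stopifnot(identical(names(red_toy), names(white_toy)))
wine_toy <- rbind(red_toy, white_toy)

# Row counts must add up, and the flag must recover the two groups
stopifnot(nrow(wine_toy) == nrow(red_toy) + nrow(white_toy))
type_counts <- table(wine_toy$wine.type)
print(type_counts)

# Share of red wines implied by the dimensions reported in the text
red_share <- 1599 / (1599 + 4898)
print(round(red_share, 4))
```

The share of red wines implied by these counts, about 0.2461, matches the mean of `wine.type` in the statistical summary of Section 2.4.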

MetaData For Red Wine

library(DataExplorer)
library(kableExtra)
# Basic metadata for the red wine data
z <- introduce(wineRed)
z <- as.data.frame(t(z))
colnames(z) <- NULL
knitr::kable(
  z,
  caption = "Data Introduction"
) %>% kable_styling(bootstrap_options = c("striped", "hover"),
                    full_width = FALSE,
                    font_size = 12,
                    position = "left")
Data Introduction
rows 1599
columns 13
discrete_columns 0
continuous_columns 13
all_missing_columns 0
total_missing_values 0
complete_rows 1599
total_observations 20787
memory_usage 163576

MetaData For White Wine

library(DataExplorer)
library(kableExtra)
# Basic metadata for the white wine data
z <- introduce(wineWhite)
z <- as.data.frame(t(z))
colnames(z) <- NULL
knitr::kable(
  z,
  caption = "Data Introduction"
) %>% kable_styling(bootstrap_options = c("striped", "hover"),
                    full_width = FALSE,
                    font_size = 12,
                    position = "left")
Data Introduction
rows 4898
columns 13
discrete_columns 0
continuous_columns 13
all_missing_columns 0
total_missing_values 0
complete_rows 4898
total_observations 63674
memory_usage 493472

MetaData For Wine

library(DataExplorer)
library(kableExtra)
# Basic metadata for the merged wine data
z <- introduce(wine)
z <- as.data.frame(t(z))
colnames(z) <- NULL
knitr::kable(
  z,
  caption = "Data Introduction"
) %>% kable_styling(bootstrap_options = c("striped", "hover"),
                    full_width = FALSE,
                    font_size = 12,
                    position = "left")
Data Introduction
rows 6497
columns 13
discrete_columns 0
continuous_columns 13
all_missing_columns 0
total_missing_values 0
complete_rows 6497
total_observations 84461
memory_usage 653376

3.2 Principal Component Analysis (for feature selection)
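This section is only a heading in the source, so the following is a minimal sketch of one common way to run a PCA in base R, not the author's method. It uses simulated data (`sim` is hypothetical) and `prcomp()` with centering and scaling. Applied to the wine data, one would presumably pass the predictor columns and leave out `quality`. Note that PCA builds linear combinations of the variables, so strictly speaking it extracts new features rather than selecting existing ones.

```r
set.seed(1)

# Simulated predictors (hypothetical): x2 is built to correlate strongly with x1
n   <- 200
x1  <- rnorm(n)
sim <- data.frame(
  x1 = x1,
  x2 = 2 * x1 + rnorm(n, sd = 0.3),
  x3 = rnorm(n)
)

# Center and scale so that variables on different scales contribute equally
pca <- prcomp(sim, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: how each original variable contributes to each component
print(round(pca$rotation, 3))
```

Components that explain little variance are candidates for dropping; the loadings show which original variables drive the components that are kept.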

4 Analysis and Modeling

4.1 One number

pressure
paste("The mean pressure is:", round(mean(pressure$pressure),3), "mm")

[1] "The mean pressure is: 124.337 mm"

4.2 A table

library(knitr)
kable(head(pressure), format = "pipe", digits = 3)
temperature pressure
0 0.000
20 0.001
40 0.006
60 0.030
80 0.090
100 0.270
kable(tail(pressure), format = "pipe", digits = 3)
temperature pressure
14 260 96
15 280 157
16 300 247
17 320 376
18 340 558
19 360 806

5 Including Plots

You can also embed plots, for example:

Plotly

Map

For more details on organizing content with tabsets, see https://bookdown.org/yihui/rmarkdown-cookbook/html-tabs.html.

6 Equations

You can include both inline and offset equations.

6.1 Inline equations

Inline equations can be simple, such as \(y = mx + b\), or more complicated, such as \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\).

6.2 Offset equations
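The source leaves this section empty, so the example below is an assumption: it sets a simple linear regression line, of the kind used for the inline example in Section 6.1, as an offset equation. Offset (display) equations are delimited with `\[` and `\]` instead of `\(` and `\)`, and are rendered centered on their own line.

```latex
\[
  \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x
\]
```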

7 References